for Computational Linguistics and the 12th International Joint Conference on Natural
Language Processing, pages 102–108, 2022.
[41] De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, and Nanning Zheng. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1335–1344, 2016.
[42] Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel
Soudry. Neural gradients are near-lognormal: improved quantized and sparse training.
arXiv preprint arXiv:2006.08173, 2020.
[43] Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. PACT: Parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085, 2018.
[44] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018.
[45] Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341, 2019.
[46] Benoît Colson, Patrice Marcotte, and Gilles Savard. An overview of bilevel optimization. Annals of operations research, 153(1):235–256, 2007.
[47] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Training deep neural
networks with low precision multiplications. arXiv preprint arXiv:1412.7024, 2014.
[48] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. Advances in neural information processing systems, 28, 2015.
[49] Richard Crandall and Carl Pomerance. Prime numbers. Springer, 2001.
[50] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. Universal transformers. In International Conference on Learning Representations, 2019.
[51] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz
Kaiser. Universal transformers. arXiv preprint arXiv:1807.03819, 2018.
[52] Alessio Del Bue, João Xavier, Lourdes Agapito, and Marco Paladini. Bilinear modeling via augmented Lagrange multipliers (BALM). IEEE transactions on pattern analysis and machine intelligence, 34(8):1496–1508, 2011.
[53] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[54] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.